Agent Based Pattern Recognition

Authors

  • Radu D. Găceanu
  • Horia F. Pop
  • László Kozma
Abstract

Abstract data types (ADTs) [WB01] are used in software applications to model real-world entities from the application domain. An ADT can be implemented using different data structures. The study of data structures and the algorithms that manipulate them is among the most fundamental topics in computer science [Mou01]. Most of what computer systems spend their time doing is storing, accessing, and manipulating data in one form or another. There are numerous examples from all areas of computer science where a relatively simple application of good data structure techniques resulted in massive savings in computation time and, hence, money.

Let us consider a software application that uses a Collection ADT (also known as a Bag). The main operations supported by a collection of elements are: insertion of an element into the collection, deletion of an element from the collection, and searching for an element in the collection.

In order to better motivate our approach, we performed an experiment considering the List ADT and three data structures for implementing a List: vector (dynamic array), linked list and balanced search tree. The main operations supported by a list of elements are: insertion of an element into the list (at the beginning, at the end, or at a certain position), deletion of an element from the list (a given element or the element at a given position), searching for an element in the list, iterating through the list, accessing the element at a certain position, and updating the element at a certain position.

4.2 Automatic selection of data representations using ANN

Data structures [WB01] provide means to customize an abstract data type according to a given usage scenario. The volume of the processed data and the data access flow in the software application influence the selection of the most appropriate data structure for implementing a certain abstract data type.
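The idea that a single ADT admits several interchangeable data structures can be sketched as follows. This is a minimal illustration in Python, not the implementation used by the authors: the names `ListADT`, `VectorList`, `LinkedList` and `fill` are hypothetical, and only two of the three structures mentioned above are shown.

```python
from abc import ABC, abstractmethod


class ListADT(ABC):
    """Hypothetical List ADT: client code depends only on these operations."""

    @abstractmethod
    def insert(self, position, element): ...

    @abstractmethod
    def get(self, position): ...

    @abstractmethod
    def __len__(self): ...


class VectorList(ListADT):
    """List ADT backed by a dynamic array (Python's built-in list)."""

    def __init__(self):
        self._items = []

    def insert(self, position, element):
        if not 0 <= position <= len(self._items):
            raise IndexError("position out of range")
        self._items.insert(position, element)  # shifts the tail of the array

    def get(self, position):
        if not 0 <= position < len(self._items):
            raise IndexError("position out of range")
        return self._items[position]  # direct indexed access

    def __len__(self):
        return len(self._items)


class LinkedList(ListADT):
    """List ADT backed by a singly linked chain of [value, next] nodes."""

    def __init__(self):
        self._head = None
        self._size = 0

    def insert(self, position, element):
        if not 0 <= position <= self._size:
            raise IndexError("position out of range")
        if position == 0:
            self._head = [element, self._head]  # constant-time front insertion
        else:
            node = self._head
            for _ in range(position - 1):  # walk to the predecessor node
                node = node[1]
            node[1] = [element, node[1]]
        self._size += 1

    def get(self, position):
        if not 0 <= position < self._size:
            raise IndexError("position out of range")
        node = self._head
        for _ in range(position):  # sequential walk to the requested node
            node = node[1]
        return node[0]

    def __len__(self):
        return self._size


def fill(lst):
    """Run the same client code against any ListADT implementation."""
    lst.insert(0, "b")
    lst.insert(0, "a")
    lst.insert(2, "c")
    return [lst.get(i) for i in range(len(lst))]
```

Both implementations produce the same observable behaviour for `fill`; they differ only in how much work each operation costs, which is the property the selection problem is concerned with.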
During the execution of the software application, the data flow and volume fluctuate due to external factors (such as user interaction), which is why the data structure selection has to be dynamically adapted to the software system’s execution context. This adaptation has to be made during the execution of the software application, and it is hard or even impossible for the software developer to predict. Consequently, in our opinion, machine learning techniques would provide a better selection, at runtime, of the appropriate data structure for implementing a certain abstract data type.

CHAPTER 4. SUPERVISED LEARNING IN SOFTWARE DEVELOPMENT

Artificial neural networks are emerging as the technology of choice for many applications, such as pattern recognition, speech recognition [SH07], prediction [LB05], system identification and control. We use a feedforward neural network trained with the backpropagation-momentum learning technique [RN02].

4.3 Experimental evaluation

In this section we aim at evaluating the accuracy of the technique proposed in Section 4.2, i.e. the ANN model’s prediction accuracy. As there is no publicly available case study for the problem of automatic selection of data representations, nor a case study in the related literature that can be reproduced, we consider our own case study. We describe in this section simulation results of applying our learning-based approach to the selection problem described below.

Starting from the data set given at [For10], we have simulated an experiment for selecting the most appropriate data structure for implementing the List ADT. The considered data set consists of the results of a chemical analysis of wines grown in the same region in Italy but derived from different cultivars. The analysis determined the quantities of 13 constituents found in each type of wine [Win91].
The data set for evaluating the ANN classification model presented in Section 4.2 consists of (input, output) samples collected and pre-processed as we have described in Subsection 4.2.2. An input represents an execution context and the target output is the most suitable implementation for the List ADT (1, 2 or 3 according to the selected implementation). In our case study, as the instantiation of the List ADT occurs in the Wine class, an execution context contains the values of the attributes of this class (13 attributes corresponding to the wine constituents described at [Win91]). The collected data set consists of 178 input-output samples and will be denoted by D.

Considering the experimental results presented above, we can conclude that our approach provides optimized data structure selection and reduces the computational time by selecting the data structure implementation which provides a minimum overall complexity for the operations performed on a certain abstract data type in a given execution scenario.

4.4 Comparison to related work

In this section we aim at providing a brief comparison of our approach with several existing approaches for the problem of automatic selection of data representations. To our knowledge, there are so far no machine learning approaches for the considered problem and, moreover, there are no publicly available case studies for it.

4.5 Automatic selection of data representations using SVM

The design and implementation of efficient abstract data types are important issues for software developers. Selecting and creating the appropriate data structure for implementing an abstract data type is not a trivial problem for a software developer, as it is hard to anticipate all the use scenarios of the deployed application. Moreover, it is not clear how to select a good implementation for an abstract data type when access patterns to it are highly variable, or even unpredictable.
The problem of automatic data structure selection is a complex one because each particular data structure is usually more efficient for some operations and less efficient for others. A static analysis for choosing the best representation can therefore be inappropriate, as the performed operations cannot be statically predicted. Therefore, we propose a predictive model in which the software system learns to choose the appropriate data representation, at runtime, based on the effective data usage pattern. This paper describes a new attempt to use a Support Vector Machine model in order to dynamically select the most suitable representation for an aggregate according to the software system’s execution context. Computational experiments confirm a good performance of the proposed model and indicate the potential of our proposal. The advantages of our approach in comparison with similar approaches are also emphasized.

The study of data structures and the algorithms that manipulate them is among the most fundamental topics in computer science [Mou01]. Most of what computer systems spend their time doing is storing, accessing, and manipulating data in one form or another. There are numerous examples from all areas of computer science where a relatively simple application of good data structure techniques resulted in massive savings in computation time and, hence, money.

Software applications use abstract data types (ADTs) [WB01] to model real-world entities from the application domain. An ADT can be implemented using different data structures. Let us consider a software application that uses a Collection ADT (also known as a Bag). The main operations supported by a collection of elements are: insertion of an element into the collection, deletion of an element from the collection, and searching for an element in the collection.
In order to better motivate our approach, we performed an experiment considering the List ADT and three data structures for implementing a List: vector (dynamic array), linked list and balanced search tree. The main operations supported by a list of elements are: insertion of an element into the list (at the beginning, at the end, or at a certain position), deletion of an element from the list (a given element or the element at a given position), searching for an element in the list, iterating through the list, accessing the element at a certain position, and updating the element at a certain position.

In this section we present several existing approaches for the problem of automatic selection of data representations. To our knowledge, there are so far no machine learning approaches for the considered problem and, moreover, there are no publicly available case studies for it.

Data structures [WB01] provide means to customize an abstract data type according to a given usage scenario. The volume of the processed data and the data access flow in the software application influence the selection of the most appropriate data structure for implementing a certain abstract data type. During the execution of the software application, the data flow and volume fluctuate due to external factors (such as user interaction), which is why the data structure selection has to be dynamically adapted to the software system’s execution context. This adaptation has to be made during the execution of the software application, and it is hard or even impossible for the software developer to predict. Consequently, in our opinion, machine learning techniques would provide a better selection, at runtime, of the appropriate data structure for implementing a certain abstract data type.

First, the software system S is monitored during the execution of a set of scenarios that include the instantiation of the abstract data type T.
The result of this supervision, performed by a software developer, is a set of execution contexts, together with the type and the number of operations from O performed on T, saved in a log file. The software developer analyzes the resulting log file and decides, for each execution context (input), the most suitable implementation for T given that execution context (output). This decision is based on computing, for each possible implementation D_i of T, the global computational complexity of the operations performed on T during the scenario given by the execution context, and then selecting the implementation that minimizes the overall complexity.

SVMs use a technique known as the “kernel trick” to apply linear classification techniques to nonlinear classification problems. Using a kernel function [Vap00], the data points from the input space are mapped into a higher-dimensional space. Constructing (via the kernel function) a separating hyperplane with maximum margin in the higher-dimensional space yields a nonlinear decision boundary in the input space, separating the tuples of one class from those of another.

In our current implementation, we have considered execution contexts of radius 0 (i.e. R = 0). This means that the execution context contains only the state of the object that uses the abstract data type T considered for optimisation.

4.6 Computational experiments

In this section we aim at evaluating the accuracy of the technique proposed in Section 4.5, i.e. the SVM classification model’s prediction accuracy. As there is no publicly available case study for the problem of automatic selection of data representations, nor a case study in the related literature that can be reproduced, we consider our own case studies. We describe in this section simulation results of applying our classification approach to the two selection problems described in the following.
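The labelling step described in Section 4.5 (computing, for each logged execution context, the overall complexity of the performed operations under each candidate implementation and keeping the minimum) can be sketched as follows. This is a rough illustration only: the cost table uses assumed textbook asymptotic costs rather than the authors' cost model, and the names `COST`, `overall_cost` and `best_implementation`, as well as the operation names, are hypothetical.

```python
import math


def _log_cost(n):
    # Assumed O(log n) cost; clamped so that tiny containers still cost >= 1.
    return math.log2(max(n, 2))


OPERATIONS = ("insert_front", "insert_back", "access_at", "search")

# Assumed asymptotic cost of each operation on a container holding n elements.
COST = {
    "vector": {
        "insert_front": lambda n: n,   # shifts every element
        "insert_back": lambda n: 1,    # amortized constant
        "access_at": lambda n: 1,      # direct indexing
        "search": lambda n: n,         # linear scan
    },
    "linked_list": {
        "insert_front": lambda n: 1,
        "insert_back": lambda n: 1,    # assumes a tail pointer is kept
        "access_at": lambda n: n,      # sequential walk
        "search": lambda n: n,
    },
    # Assumes a balanced tree augmented for positional access.
    "balanced_tree": dict.fromkeys(OPERATIONS, _log_cost),
}


def overall_cost(implementation, op_counts, n):
    """Sum of (number of calls) x (assumed cost per call) for one scenario."""
    table = COST[implementation]
    return sum(count * table[op](n) for op, count in op_counts.items())


def best_implementation(op_counts, n):
    """Label a scenario with the implementation of minimum overall cost."""
    return min(COST, key=lambda impl: overall_cost(impl, op_counts, n))
```

For example, under these assumed costs a scenario dominated by positional access on 1000 elements is labelled `"vector"`, one dominated by insertions at both ends is labelled `"linked_list"`, and one dominated by searches is labelled `"balanced_tree"`. Labels of this kind would then serve as target outputs for the classifier.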
Starting from the data set given at [For10], we have simulated an experiment for selecting the most appropriate data structure for implementing the List ADT. The considered data set consists of the results of a chemical analysis of wines grown in the same region in Italy but derived from different cultivars. The analysis determined the quantities of 13 constituents found in each type of wine. More details about this data set can be found at [Win91].

The data set for evaluating the SVM classification model presented in Section 4.5 consists of (input, output) samples that were collected and pre-processed. An input represents an execution context and the target output is the most suitable implementation for the List ADT (vector, linked list or balanced search tree) within the input execution context. The data set consists of 178 samples. An overall learning accuracy of 0.9625 was obtained.

As a second case study for evaluating the learning accuracy of the SVM, we consider a real software system. It is a DICOM (Digital Imaging and Communications in Medicine) [DICiM11] and HL7 (Health Level 7) [HL0] compliant PACS (Picture Archiving and Communications System), facilitating the management of medical images, offering quick access to radiological images, and making the diagnosing process easier. The analyzed application is a large distributed system, consisting of several subsystems in the form of stand-alone and web-based applications. We have considered as our case study one of the subsystems of this application. The analyzed subsystem is a stand-alone Java application used by physicians to interpret radiological images. The application fetches clinical images from an image server (using the DICOM protocol) or from the local file system (using DICOM files), displays them, and offers various tools to manage radiological images.
For evaluation we have used a set of 96 image series samples obtained from publicly available DICOM image files [Osi10, RiplsDis10, cir10, oPDsf10, hp10]. The images are real images from real patients, anonymized for confidentiality reasons. For managing the DICOM image files, an open source implementation of the DICOM standard was used [sciom11]. The results are stable: a standard deviation of 0.040024407 on the classification accuracies was obtained. The low value of the standard deviation indicates a good precision of the proposed approach.

Considering the experimental results presented in Section 4.6, we can conclude that our approach provides optimized data structure selection and reduces the computational time by selecting the data structure implementation which provides a minimum overall complexity for the operations performed on a certain abstract data type in a given execution scenario.

4.7 Comparison to related work

In this section we aim at providing a brief comparison of our approach with the existing approaches for the problem of automatic selection of data representations.

4.8 Conclusions and future work

In this chapter we have presented our model for dynamically selecting the most suitable implementation of an abstract data type from a software application based on the system’s execution context. For predicting, at runtime, the most appropriate data representation, a neural network and a support vector machine classification model were used. We have also illustrated the accuracy of both proposed approaches on case studies.

Considering the results presented in Section 4.3 and in Section 4.6, we can conclude that the approaches introduced in this paper for a dynamic selection of data representations have the following advantages:

  • They are general, as they can be used for determining the appropriate implementation for any abstract data type, and with an arbitrary number of data structures that can be chosen for implementing the ADT.
  • They reduce the computational time by selecting the data structure implementation which provides a minimum overall complexity for the operations performed on a certain abstract data type in a given execution scenario. Consequently, the efficiency of the software system during its evolution is increased.
  • They are scalable: even if the considered software system is large, the abstract data types are locally optimized, considering only the current execution context. The size of the execution context does not depend on the size of the software system, as shown in the thesis.

However, the main drawback of both approaches is that it is hard to supervise the learning process, as the supervision of an expert software developer is required for inspecting the collected execution contexts.

Further work will be focused on:

  • Improving the proposed classification model by adding to it the capability to adapt itself using feedback received when inappropriate data representations are selected.
  • Applying other machine learning techniques [KTL11, ZDYZ11], self-organizing feature maps [SK99], or other modelling techniques [RKY10, Ngu10, TD10] for solving the problem of automatic selection of data representations during the execution of a software system.
  • Studying the applicability of other learning techniques, like semi-supervised learning [ZL10] or reinforcement learning [SB98], in order to avoid supervision during the training process as much as possible.
  • Evaluating our approach on other case studies and real software systems.

Publication date: 2012